Full support for DB numbers other than zero in cluster mode. #410

allenss-amazon · 2025-10-09T14:52:43Z

fixes #204

allenss-amazon · 2025-10-14T00:28:31Z

Fixes issue #204

Signed-off-by: Allen Samuels <[email protected]>

murphyjacob4 · 2025-10-15T08:08:41Z

src/schema_manager.h

+The SchemaManager uses the MetadataManager to manage a single name space of
+indexes, i.e., map<string, GlobalMetadataEntry>.


Appreciate the thought going into this. The interoperability is a headache

One question: when someone creates an index in db_num > 0 on Valkey 9, how would a shard running on Valkey 8 behave? I am guessing it would crash, right? I guess that behavior is fine, but we should document this

Another option I guess is that we have namespaces in the metadata manager. Right now we use "vs_index_schema". We could move to "vs_index_schema2"? Nodes on the old versions would just not get the index definitions for those created with vs_index_schema2, which should be fine, and we can register for callbacks of "vs_index_schema" in the new node for backwards compat. Probably a bit more work, but it would prevent the Valkey 8.0 nodes from crashing in such cases

Signed-off-by: Allen Samuels <[email protected]>

allenss-amazon · 2025-10-24T05:01:52Z

Revised to reject RDB or cross-shard metadata that contains any object with a semantic version higher than what's understood by the current code.

Signed-off-by: Allen Samuels <[email protected]>

murphyjacob4 · 2025-11-03T18:26:01Z

src/coordinator/metadata_manager.cc

+The mapping is done by manipulating the object name. It is known that
+pre-Valkey 9 object names cannot contain a valid hash-tag. This provides the
+necessary information to distinguish between an object name and an encoded
+<db_num, object_name> pair. Objects with a db_num of 0 are stored the same
+way as pre-Valkey 9 objects, providing backward compatibility.


Now that we have the encoding version added - could we just add a new field to the IndexSchema proto (e.g. clustered_db_num)? Old versions would silently drop it, but newer versions would support it.

This way, db_num can be removed from the metadata manager entirely (not encoded in the name and not added as a middle layer of the table).

That would disallow two indexes in different db's to have the same name.

The underlying problem is that MetadataManager has a two-level naming scheme and now we need a three-level naming scheme. Plus, this stuff is preserved in the RDB file, so there's a backward compatibility issue. We can certainly move to a three-level scheme with conversion from old code. But that's a lot of changes to MetadataManager.

Another scheme would be to reuse the top-level naming for each DB. That's a change in the philosophy of MetadataManager as it would require dynamic registration of the type level types.

I see, yeah that makes sense, was missing that part.

If I were redesigning it - the metadata entry would be indexed by a UUID instead of just the index name. The UUID could be computed by the original creator.

Is there some way we could move to a UUID based scheme now? I see two ways:

Leave the existing metadata type namespace, and just change the schema name to be UUID indexed:

If we assume the new UUIDs are unique (they should be, or at least have an astronomically low chance of collision) then they should be able to merge cleanly with the existing IDs. We would just make the "ID" for metadata manager work like: DB!=0 -> UUID, DB==0 -> index name

Reading the code, the only place the ID is used in the Schema Manager is we use this to delete the existing index before processing the newer version of the index. So it shouldn't cause a crash. If you do create an index on DB!=0 when the cluster has nodes on the older version of Valkey-Search, it should work okay actually. If you later went to delete it, it would skip the delete since it would not find the index by the ID (which is now a UUID), which I guess is okay since it isn't even accessible on that node (it is just wasted resources). If you were to recreate it - it would log an error and fail to process - which I also think is okay since it is inaccessible on that node. Once you upgrade to the latest version, it should be able to handle the metadata (maybe we need to add some MetadataManager <-> SchemaManager reconciliation on load)

Add a new metadata type namespace for UUID-indexed:

This would separate the two indexing schemes into two high level types, the old one, which is used for DB=0, and the new one, which is used for DB!=0.

Nodes on the old versions would just not apply the metadata for unknown types. So this works quite cleanly. Once they upgrade, ideally they would then apply the metadata

What do you think? 1 is less clean on the older version nodes, but this already feels like an edge case. As long as we can get it to not crash and it would eventually recover when fully upgraded, I am okay with some odd behaviors on the old nodes. But 2 should be quite clean, I just don't know if we want to support this two-type setup long term (it is complexity that we later need to clean up)

Today, the MetadataManager takes care of resolving consistency issues. In order to support UUID naming, it would have to have some way to determine if a newly received UUID conflicts with some existing UUID. That would require that registered types have some mechanism that would assist this. In other words a client would be queried with a new protobuf and would have to return some UUID that conflicted with it. That seems worse to me, the value of MetadataManager is that he understands the objects and performs collision resolution.

Here, the problem is fundamentally naming. If we confine the discussion to one registered type, then the current system has a flat namespace. I think you either redesign MetadataManager to have a more hierarchical namespace OR you encode the hierarchy into the existing flat namespace. I've chosen the second path -- because I believe you were on the right path before, but the addition of the dbnum, was just unexpected. (BTW, we'll face exactly the same issues when we implement aliasing -- which I plan to do soon).

It is fundamentally about deriving some "primary key" from the index metadata that we can use to uniquely identify an index. It seems like a fundamental inflexibility that the primary key cannot be updated. That's fine as long as we are sure it will never really update. If dbnum is some once-in-a-project problem, then I am okay with what you have proposed.

But - if we need to add another field to the primary key - it seems like we won't be able to take the same path out again, right?

How about this: could we use a proto for the primary key?

message IndexMetadataKey { string name = 1; int db_num = 2; ... }

So we would then serialize this and use this as the key. If we ever add a new field to the key, it would be cross version compatible as long as the default value for that field is handled. It is a similar option to UUID, but it guarantees two semantically identically indexes collide and are reconciled by the metadata manager

allenss-amazon force-pushed the dbnum branch from 5e8f2b4 to bc0f77e Compare October 14, 2025 00:19

allenss-amazon changed the title ~~Initial IndexName encode/decode~~ Full support for DB numbers other than zero in cluster mode. Oct 14, 2025

allenss-amazon marked this pull request as ready for review October 14, 2025 00:33

allenss-amazon requested review from yairgott and zvi-code October 14, 2025 00:34

allenss-amazon added 7 commits October 14, 2025 04:15

Initial IndexName encode/decode

f139ba6

Signed-off-by: Allen Samuels <[email protected]>

working now

dde0743

Signed-off-by: Allen Samuels <[email protected]>

format

85f9890

Signed-off-by: Allen Samuels <[email protected]>

formatting, remove unneded logs

5a02d67

Signed-off-by: Allen Samuels <[email protected]>

spelling

a368b57

Signed-off-by: Allen Samuels <[email protected]>

formatting

b0344fe

Signed-off-by: Allen Samuels <[email protected]>

Update for aggregation

c661d83

Signed-off-by: Allen Samuels <[email protected]>

allenss-amazon force-pushed the dbnum branch from ee1885a to c661d83 Compare October 14, 2025 04:50

murphyjacob4 reviewed Oct 15, 2025

View reviewed changes

allenss-amazon added 9 commits October 19, 2025 16:45

Rewrite per feedback, move encoding into metadata_manager

d9a2027

Signed-off-by: Allen Samuels <[email protected]>

Merge branch 'main' into dbnum

f781eaf

fix merge errors

37eb690

Signed-off-by: Allen Samuels <[email protected]>

Revise per-object Encoding Version

e16426e

Signed-off-by: Allen Samuels <[email protected]>

fix integration test

bc81a74

Signed-off-by: Allen Samuels <[email protected]>

Enhance metadata tests

dfd6753

Signed-off-by: Allen Samuels <[email protected]>

Update semantic versioning

dc05b01

Signed-off-by: Allen Samuels <[email protected]>

Merge branch 'main' into dbnum

ccb71b7

Implement schema-level versioning

cd2a60e

Signed-off-by: Allen Samuels <[email protected]>

allenss-amazon added 4 commits October 24, 2025 05:02

include omitted test file

e76d501

Signed-off-by: Allen Samuels <[email protected]>

fix valkey version post GA

23c307f

Signed-off-by: Allen Samuels <[email protected]>

remove old drop option

4dc92c1

Signed-off-by: Allen Samuels <[email protected]>

Merge branch 'main' into dbnum

0e7c9f7

murphyjacob4 reviewed Nov 3, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Full support for DB numbers other than zero in cluster mode. #410

Full support for DB numbers other than zero in cluster mode. #410

Uh oh!

allenss-amazon commented Oct 9, 2025 •

edited

Loading

Uh oh!

allenss-amazon commented Oct 14, 2025

Uh oh!

murphyjacob4 Oct 15, 2025

Uh oh!

allenss-amazon commented Oct 24, 2025

Uh oh!

murphyjacob4 Nov 3, 2025

Uh oh!

allenss-amazon Nov 3, 2025

Uh oh!

allenss-amazon Nov 3, 2025

Uh oh!

allenss-amazon Nov 3, 2025

Uh oh!

murphyjacob4 Nov 3, 2025

Uh oh!

allenss-amazon Nov 3, 2025 •

edited

Loading

Uh oh!

murphyjacob4 Nov 5, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		The SchemaManager uses the MetadataManager to manage a single name space of
		indexes, i.e., map<string, GlobalMetadataEntry>.

Full support for DB numbers other than zero in cluster mode. #410

Are you sure you want to change the base?

Full support for DB numbers other than zero in cluster mode. #410

Uh oh!

Conversation

allenss-amazon commented Oct 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

allenss-amazon commented Oct 14, 2025

Uh oh!

murphyjacob4 Oct 15, 2025

Choose a reason for hiding this comment

Uh oh!

allenss-amazon commented Oct 24, 2025

Uh oh!

murphyjacob4 Nov 3, 2025

Choose a reason for hiding this comment

Uh oh!

allenss-amazon Nov 3, 2025

Choose a reason for hiding this comment

Uh oh!

allenss-amazon Nov 3, 2025

Choose a reason for hiding this comment

Uh oh!

allenss-amazon Nov 3, 2025

Choose a reason for hiding this comment

Uh oh!

murphyjacob4 Nov 3, 2025

Choose a reason for hiding this comment

Uh oh!

allenss-amazon Nov 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

murphyjacob4 Nov 5, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

allenss-amazon commented Oct 9, 2025 •

edited

Loading

allenss-amazon Nov 3, 2025 •

edited

Loading